Legacy Research Web Collections
Research-related collections of digital content on the web that are now outdated and/or no longer actively maintained. This can include software and published or unpublished source code. |
||
Digital Species: Web, Research Outputs |
Trend in 2023: No Change |
Consensus Decision |
Added to List: 2019 |
Trend in 2024: No Change |
Previously: Critically Endangered |
Imminence of Action: Action is recommended within twelve months; detailed assessment is a priority. |
Significance of Loss: The loss of tools, data or services within this group would impact on people and sectors around the world. |
Effort to Preserve | Inevitability: Loss seems likely. By the time tools or techniques have been developed, the material will likely have been lost. |
Examples: Academic and institutional websites from the first decade of the web containing details of research projects and interests as well as research data. |
||
‘Practically Extinct’ in the Presence of Aggravating Conditions: Inaccessible to web archive; bespoke code; insufficient documentation; uncertainty over IPR or the presence of orphaned works. |
||
‘Endangered’ in the Presence of Good Practice: Secured by web archive; documentation and rights information published alongside material. |
||
2023 Review: This entry was added in 2019. While there are overlaps with the ‘Semi-Published Research Data’ and ‘Unpublished Research Data’ entries, it is a separate entry to distinguish between ‘current’ and ‘legacy’ collections, which have different risk profiles. In 2020, the risk classification was raised to Critically Endangered because materials in legacy web collections were no longer actively maintained. The 2021 Jury agreed with these distinctions, adding that loss has already occurred and that future loss can be prevented through approaches such as web archiving and code preservation. They identified a 2021 trend toward greater risk, based on security issues posed by hosting legacy technology, software and services, which prompted imminent disposal of content without adequate review or selection. The 2022 Taskforce agreed with this assessment, noting no change to the trend (it remained on the same basis as before). The 2023 Council agreed with the Critically Endangered classification, with risks remaining on the same basis as before (‘No change’ to trend), but also noted a greater inevitability of loss compared to previous reviews. Additionally, the Council recommended that a received nomination for an entry on unpublished digital indices and transcriptions in the DIMEV Open-Access Digital Edition of the Index of Middle English Verse be included as a valuable example within this entry rather than as a new, standalone entry. The 2023 Council also recommended that the next major review consider rescoping the entry, possibly splitting it into separate areas to assess different levels of risk relating to published and unpublished source code in legacy research web collections. |
||
2024 Interim Review: These risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend). |
||
Additional Comments: These collections are valuable but lose funding and care as institutions reconfigure their tasks and individuals step back from them due to retirement or (in the case of volunteers) old age. There are countless legacy research web resources that people don’t know about. This is not necessarily a technical challenge but a resource challenge. The Internet Archive and national web archiving bodies hold copies of many websites that would fit into this category, but by no means all. There is also a distinction between the data and the software or code used to deliver the user experience; such code is secondary to the content. The issue can be intensified by legacy IT infrastructure in cases where much of the content is hosted there, as security concerns may lead to imminent disposal of content. In these scenarios, the imminence of action becomes more urgent given the security issues posed by hosting legacy technology and software. Case Studies or Examples:
|